Locally built whl files are not reproducible #154
Comments
I just ran into the same issue while trying to get our Python test to work with the remote cache. I wrote up all the details in this thread: https://groups.google.com/forum/#!topic/bazel-sig-python/0leDk_aD8FM Fixing this would be great, but in the case of the PyYAML module it is apparently not enough to fix the caching.
@PoncinMatthieu I tried to reproduce it with PyYAML, but it seems to work for me. Maybe there is some sort of timestamp involved in your case? Regarding the solution to the build dir problem, I also tried to use "-b", os.getcwd(), and it seems to work correctly. os.getcwd() points to the place where the resulting wheels are stored. Might that be a solution for our problem here?
@ltekieli Thanks a lot for trying it out with PyYAML!
I have noticed that for some packages, such as "apache-airflow", we also need to upgrade the pip tools to their latest versions to get deterministic builds. I believe this is what the current version of pip is missing: https://bitbucket.org/pypa/wheel/pull-requests/47/make-metadata-generation-deterministic/diff
Lastly, I found out that some specific modules are not reproducible because of the "optimize" setting, which includes compiled files in the build. This can be tested on the same repo by adding the requirement
One thing I also noticed is that setting an env var |
…4\#issue-396782772. Attempting to get reproducible .so files in whls
I'm debugging this same issue and unfortunately even having I can do two Running
I'd be interested to know where that tmp dir is being set. Note: We're using
It's coming from here:
@brandjon What do you think about making this a P1? Deterministic builds for caching are pretty central to Bazel's value proposition, unless there's a known mitigation for this problem.
I think it's possible to get the tests to cache by tweaking some environment variables. https://github.com/ltekieli/rules_python_bug Added the environment variables:
rules_python 0.1.0 has been released which upstreams the rules_python_external repo. Please switch from
Pip uses a temporary directory as the build directory for system-specific libraries (e.g. psutil). The path of that directory ends up in the resulting *.so files, so each call of pip produces a different *.so file.
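One way to confirm which files inside a wheel actually differ between two builds is to hash every archive member and compare. The following is a minimal sketch, not part of rules_python or pip; the function names `wheel_member_hashes` and `differing_members` are made up for illustration. It compares file contents only, so differences in zip metadata (such as member timestamps) would not show up here.

```python
import hashlib
import zipfile


def wheel_member_hashes(path):
    """Map each member name in the wheel (a zip archive) to its SHA-256 digest."""
    hashes = {}
    with zipfile.ZipFile(path) as wheel:
        for name in wheel.namelist():
            hashes[name] = hashlib.sha256(wheel.read(name)).hexdigest()
    return hashes


def differing_members(wheel_a, wheel_b):
    """Return the sorted member names whose contents differ or exist in only one wheel."""
    a = wheel_member_hashes(wheel_a)
    b = wheel_member_hashes(wheel_b)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))
```

Running `differing_members` on two wheels of the same package built in separate pip invocations should list the *.so files if the build directory is the source of the difference.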
The following repository illustrates the issue https://github.com/ltekieli/rules_python_bug.
It uses --disk_cache to specify a common cache between runs. Each run is done in a new Docker container instance.
Buggy output is:
In the second run the test result should be taken from the cache, but due to the nondeterministic whl it is not.
Applying the following patch solves this issue and the test is properly taken from the cache.
And correct output:
The problem with this solution is that parallel executions of pip might write to the same build directory when tmp is not sandboxed by Bazel. I'm not sure how to solve this properly yet.
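One possible direction, sketched here purely as an assumption and not as the fix adopted by rules_python, is to derive the build directory from the requirement string. The path then stays the same across runs, but different requirements do not share a directory. The helper names `stable_build_dir` and `pip_wheel_args` are hypothetical. The `-b` option is the one mentioned earlier in this thread, and newer pip releases may no longer accept it. Two parallel builds of the same requirement would still collide, so this only narrows the problem.

```python
import hashlib
import os


def stable_build_dir(base_dir, requirement):
    """Return a build directory path that depends only on base_dir and the requirement."""
    digest = hashlib.sha256(requirement.encode("utf-8")).hexdigest()[:16]
    return os.path.join(base_dir, "pip-build-" + digest)


def pip_wheel_args(requirement, wheel_dir, base_dir):
    """Assemble arguments for a `pip wheel` call that uses the stable build directory."""
    build_dir = stable_build_dir(base_dir, requirement)
    # "-b" sets pip's build directory; support for it depends on the pip version (assumption).
    return ["wheel", "-w", wheel_dir, "-b", build_dir, requirement]
```

For example, `pip_wheel_args("psutil==5.4.8", "wheels", "/tmp")` yields the same argument list on every run, while a different requirement string maps to a different directory.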